Assignment: 2021 VAST Mini Challenge 2

The Kronos Incident

Connie Xia https://example.com/conniexia
07-25-2021

Introduction

This Visual Analytics Assignment is based on the VAST Challenge 2021 Mini Challenge 2. The overview of the Challenge is about a Tethys-based GAStech that has been operating a natural gas production site in the island country og Kronos. Despite bringing in remarkable profits and developing strong relationships with the Kronos government, GAStech has not been as successful in demonstrating environmental stewardship.

In January 2014, while the leaders of GAStech were celebrating their successful initial public offering (IPO), several of the company’s employees go missing. It is suspected that an organisation called the Protectors of Kronos (POK) was involved in the employees’ disappearance, but things may not be what they seem.

The Mini Challenge

Mini Challenge 2 is about analysing the movement and tracking data of the GAStech employees, as well as their card card transactions and loyalty card usage data. From which, any anomalies and suspicious behaviours of the GAStech employees will be identified and analysed.

Literature Reviews

Here.

Data Sources

The data sources used in the mini challenge are:

Data sources Description
car-assignments.csv List of vehicle assignments by employee
Geospatial folder ESRI shapefiles of Abila and Kronos
gps.csv Vehicle tracking data
loyalty_data.csv Loyalty card transaction data
cc_data.csv Credit and debit card transaction data
MC2-Tourist.jpg Tourist map of Abila with locations of interest identified

R Packages Used

Below are the list of R packages installed and used in this assignment.

packages = c('raster', 'sf', 'tmap', 'lubridate', 
             'tidyverse', 'igraph', 'tidygraph', 
             'ggraph', 'visNetwork', 'clock',
             'DT', 'zoo', 'parcoords')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Location Analysis

Data Preparation

The credit card and loyalty card csv files are first loaded into R.

cc_data <- read_csv("./data/cc_data.csv")
loyalty_data <- read_csv("./data/loyalty_data.csv")

Taking a look at the two datasets, we can observe that the timestamp, location and loyaltynum are in character field while price and last4ccnum are in numerical field. It is to be noted that the timestamp field should be in date-time format.

glimpse(cc_data)
Rows: 1,490
Columns: 4
$ timestamp  <chr> "1/6/2014 7:28", "1/6/2014 7:34", "1/6/2014 7:35"~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
glimpse(loyalty_data)
Rows: 1,392
Columns: 4
$ timestamp  <chr> "1/6/2014", "1/6/2014", "1/6/2014", "1/6/2014", "~
$ location   <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price      <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~

Therefore, we need to convert the timestamp field in both datasets from character type to date-time type.

cc_data$timestamp <- date_time_parse(cc_data$timestamp,
                                     zone = "",
                                     format = "%m/%d/%Y %H:%M")

loyalty_data$timestamp <- date_time_parse(loyalty_data$timestamp,
                                          zone = "",
                                          format = "%m/%d/%Y")

Now, all the fields are in their correct data type.

glimpse(cc_data)
Rows: 1,490
Columns: 4
$ timestamp  <dttm> 2014-01-06 07:28:00, 2014-01-06 07:34:00, 2014-0~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
glimpse(loyalty_data)
Rows: 1,392
Columns: 4
$ timestamp  <dttm> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~
$ location   <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price      <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~

In order to have a better understanding and visualisation on how the credit card and loyalty card usage are related, we would need to join both datasets together.

First, it is noted that the timestamp column in the cc_data dataset has both dates and time, while the timestamp column in the loyalty_data dataset only has the date field. So, we need to extract out the date from the timestamp column of the cc_data.

cc_data$date <- format(cc_data$timestamp, "%m/%d/%Y")
cc_data$date <- date_time_parse(cc_data$date, 
                                zone = "", 
                                format = "%m/%d/%Y")

head(cc_data)
# A tibble: 6 x 5
  timestamp           location    price last4ccnum date               
  <dttm>              <chr>       <dbl>      <dbl> <dttm>             
1 2014-01-06 07:28:00 Brew've Be~ 11.3        4795 2014-01-06 00:00:00
2 2014-01-06 07:34:00 Hallowed G~ 52.2        7108 2014-01-06 00:00:00
3 2014-01-06 07:35:00 Brew've Be~  8.33       6816 2014-01-06 00:00:00
4 2014-01-06 07:36:00 Hallowed G~ 16.7        9617 2014-01-06 00:00:00
5 2014-01-06 07:37:00 Brew've Be~  4.24       7384 2014-01-06 00:00:00
6 2014-01-06 07:38:00 Brew've Be~  4.17       5368 2014-01-06 00:00:00
head(loyalty_data)
# A tibble: 6 x 4
  timestamp           location            price loyaltynum
  <dttm>              <chr>               <dbl> <chr>     
1 2014-01-06 00:00:00 Brew've Been Served  4.17 L2247     
2 2014-01-06 00:00:00 Brew've Been Served  9.6  L9406     
3 2014-01-06 00:00:00 Hallowed Grounds    16.5  L8328     
4 2014-01-06 00:00:00 Coffee Shack        11.5  L6417     
5 2014-01-06 00:00:00 Hallowed Grounds    12.9  L1107     
6 2014-01-06 00:00:00 Brew've Been Served  4.27 L4034     

Now, we can join both datasets together by the date, price and location. A new column, hour, is added to take note of the time period the employee visited the location.

cc_loyalty_data <- left_join(cc_data, loyalty_data,
                             by = c("date" = "timestamp",
                                    "location" = "location",
                                    "price" = "price"))

cc_loyalty_data$hour = hour(cc_loyalty_data$timestamp)
  
glimpse(cc_loyalty_data)
Rows: 1,496
Columns: 7
$ timestamp  <dttm> 2014-01-06 07:28:00, 2014-01-06 07:34:00, 2014-0~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
$ date       <dttm> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~
$ loyaltynum <chr> "L8566", NA, "L8148", "L5553", "L3800", "L2247", ~
$ hour       <int> 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7~

Data Visualisation

Exploratory data analysis is conducted to determine the most popular locations frequented by the employees of GAStech.

Looking at the combined dataset, we are able to observe that the top five most popular locations are:

  1. Katerina’s Cafe
  2. Hippokampos
  3. Guy’s Gyros
  4. Brew’ve Been Served
  5. Hallowed Grounds
freq_by_location <- cc_loyalty_data %>%
  select(location, price) %>%
  group_by(location) %>%
  summarise(total_price = sum(price), freq = n()) %>%
  arrange(desc(freq)) %>%
  ungroup()

ggplot(data=freq_by_location,
       aes(x = reorder(location, -freq), y = freq)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90,
                                   hjust = 1,
                                   vjust = 0.5)) + 
  ggtitle("Frequency of Visit") +
  labs(y = "Frequency", x = "Location")

Delving deeper into the top 5 most frequented places over the span of two weeks, we can plot a calendar heatmap to show the frequencies of purchases made at the top 5 most popular locations.

freq_by_date <- cc_loyalty_data %>%
  select(date, hour, location, price) %>%
  group_by(date, hour, location) %>%
  summarise(total_price = sum(price), freq = n()) %>%
  arrange(desc(freq)) %>%
  ungroup()

freq_by_date$weekday = as.POSIXlt(freq_by_date$date, format = "%d/%m/%Y")$wday

freq_by_date$weekdayf <- factor(freq_by_date$weekday, 
                                levels = rev(0:6),
                                labels = rev(c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")),
                                ordered = TRUE)

top5_plot <- freq_by_date %>%
  na.omit() %>%
  filter(location %in% c("Katerina's Cafe", "Hippokampos", "Guy's Gyros", "Brew've Been Served", "Hallowed Grounds"))

top5_heatmap <- ggplot(top5_plot, aes(hour, weekdayf, fill = top5_plot$freq)) +
  geom_tile(colour = "white") +
  facet_grid(~location) +
  theme(strip.text = element_text(size = 7)) +
  scale_fill_distiller(palette = "Reds", direction = 1) + 
  xlab("Hour") + 
  xlim(c(0,23)) +
  ylab("Day") + 
  ggtitle("Calendar Heatmap: Transaction Volume at Top 5 Locations") + 
  labs(fill = "Frequency")

top5_heatmap

The following observations and insights can be derived from the calendar heatmap:

  1. All 5 locations are possibly dining places located in Abila inferred from the time period the employees visited those locations.
  2. Brew’ve Been Served is a popular location during the weekdays around breakfast hours (0700-0800). Similarly, Hallowed Grounds is probably another popular breakfast place during the weekdays.
  3. Guy’s Gyros, Hippokampos and Katerina’s Cafe are popular lunch (1300-1400) and dinner (1900-2100) spots throughout the entire week.

Moving on, two other calendar heatmaps are plotted to view the transaction volume comparison between weekdays and weekends.

weekday_plot <- freq_by_date %>%
  na.omit() %>%
  filter(weekday %in% c(1:5))

weekday_heatmap <- ggplot(weekday_plot, aes(hour, location, fill = weekday_plot$freq)) +
  geom_tile(colour = "white") +
  theme(strip.text = element_text(size = 7)) +
  scale_fill_distiller(palette = "Reds", direction = 1) + 
  xlab("Hour") + 
  xlim(c(0,23)) +
  ylab("Location") + 
  ggtitle("Calendar Heatmap: Transaction Volume during Weekdays") + 
  labs(fill = "Frequency")

weekday_heatmap

weekend_plot <- freq_by_date %>%
  na.omit() %>%
  filter(weekday %in% c(0,6))

weekend_heatmap <- ggplot(weekend_plot, aes(hour, location, fill = weekend_plot$freq)) +
  geom_tile(colour = "white") +
  theme(strip.text = element_text(size = 7)) +
  scale_fill_distiller(palette = "Reds", direction = 1) + 
  xlab("Hour") + 
  xlim(c(0,23)) +  
  ylab("Location") + 
  ggtitle("Calendar Heatmap: Transaction Volume during Weekends") + 
  labs(fill = "Frequency")

weekend_heatmap

A boxplot show the spread of the transaction amount made by GAStech employees at different locations.

median_price <- cc_loyalty_data %>%
  group_by(location) %>%
  summarise(median_price = median(price))

spending_by_location <- cc_loyalty_data %>%
  left_join(median_price, 
            by = c("location"))

ggplot(data = spending_by_location,
       aes(x = reorder(location, -median_price), y = price)) +
  geom_boxplot(outlier.colour = "Red", outlier.fill = "Red") +
  geom_point(alpha = 0) +
  scale_y_log10() +
  theme(axis.text.x = element_text(angle = 90,
                                   hjust = 1,
                                   vjust = 0.5)) + 
  ggtitle("Transaction Amount by Location") +
  labs(y = "Spending amount", x = "Location")

Based on the above plots, we are able to spot a few anormalies that are note-worthy.

  1. Kronos Mart

From both the weekday and weekend calendar maps plotted above, there are transactions occurring at that location at the wee hours, when most people would be asleep. Further analysis revealed that there are a total of 5 transactions occurring on 3 different days (12 Jan, 13 Jan and 19 Jan) around 0300-0400 time period. The 5 transactions are all carried out on different credit cards, with no loyalty card used. It is also suspicious that the last two transactions occurred within the span of three minutes, suggesting a likely possibility that the last two card owners might have seen each other in the mart.

knitr::kable(cc_loyalty_data %>%
               filter(location %in% "Kronos Mart" & hour %in% 3) %>%
               dplyr::select(location, timestamp, date, hour, price, last4ccnum, loyaltynum) %>%
               arrange(hour), "simple",
             caption = "Transactions made in Kronos Mart")
Table 1: Transactions made in Kronos Mart
location timestamp date hour price last4ccnum loyaltynum
Kronos Mart 2014-01-12 03:39:00 2014-01-12 3 277.26 8156 NA
Kronos Mart 2014-01-13 03:00:00 2014-01-13 3 147.30 5407 NA
Kronos Mart 2014-01-19 03:13:00 2014-01-19 3 87.66 3484 NA
Kronos Mart 2014-01-19 03:45:00 2014-01-19 3 194.51 9551 NA
Kronos Mart 2014-01-19 03:48:00 2014-01-19 3 150.36 8332 NA
  1. Frydos Autosupply n’ More

Based on the boxplot plotted above, we are able to identify an outlier transaction at Frydos Autosupply n’ More. There is an exceptionally high transaction of 10,000 dollars at the store where the median price is 149.30 dollars. Similarly, this transaction was not tagged to any loyalty card.

knitr::kable(spending_by_location %>%
               filter(location %in% "Frydos Autosupply n' More") %>%
               filter(price %in% max(price)) %>%
               dplyr::select(location, timestamp, date, hour, price, median_price, last4ccnum, loyaltynum),
             caption = "Transactions made in Frydos Autosupply n' More")
Table 2: Transactions made in Frydos Autosupply n’ More
location timestamp date hour price median_price last4ccnum loyaltynum
Frydos Autosupply n’ More 2014-01-13 19:20:00 2014-01-13 19 10000 149.3 9551 NA
  1. Daily Dealz

Despite being a low transaction amount, this purchase is rather suspicious as that is not tied to any loyalty card. The maximum, median and minimum transaction amounts also seem to be the same value, suggesting that it could be a singular purchase. This also means that not many people make purchases from Daily Dealz over the span of two weeks. Further analysis confirmed the suspicion. Furthermore, the credit card used to make the purchase is same as the credit card used to make the hefty purchase at Frydos Autosupply n’ More. The timing where the transaction occurred is also rather suspicious as it is an early morning purchase.

knitr::kable(spending_by_location %>%
               filter(location %in% "Daily Dealz") %>%
               dplyr::select(location, timestamp, date, hour, price, last4ccnum, loyaltynum) %>%
               arrange(hour), "simple",
             caption = "Transactions made in Daily Dealz")
Table 3: Transactions made in Daily Dealz
location timestamp date hour price last4ccnum loyaltynum
Daily Dealz 2014-01-13 06:04:00 2014-01-13 6 2.01 9551 NA
  1. Albert’s Fine Clothing

Based on the boxplot plotted above, we are able to identify an outlier transaction at Albert’s Fine Clothing. There is large transaction of 1,239.41 dollars compared to the median spending amount of 211.47 dollars.

knitr::kable(spending_by_location %>%
               filter(location %in% "Albert's Fine Clothing") %>%
               filter(price %in% max(price)) %>%
               dplyr::select(location, timestamp, date, hour, price, median_price, last4ccnum, loyaltynum), 
             caption = "Transactions made in Albert's Fine Clothing")
Table 4: Transactions made in Albert’s Fine Clothing
location timestamp date hour price median_price last4ccnum loyaltynum
Albert’s Fine Clothing 2014-01-17 19:44:00 2014-01-17 19 1239.41 211.47 1321 L4149
  1. Chostus Hotel

Based on the boxplot plotted above, we are able to identify an outlier transaction at Chostus Hotel. There is a higher than average transaction of 600 dollars compared to the median spending amount of 114.22 dollars.

knitr::kable(spending_by_location %>%
               filter(location %in% "Chostus Hotel") %>%
               filter(price %in% max(price)) %>%
               dplyr::select(location, timestamp, date, hour, price, median_price, last4ccnum, loyaltynum), 
             caption = "Transactions made in Chostus Hotel")
Table 5: Transactions made in Chostus Hotel
location timestamp date hour price median_price last4ccnum loyaltynum
Chostus Hotel 2014-01-18 12:03:00 2014-01-18 12 600 114.22 5010 L2459
  1. Credit Card Number 9551

This credit card is highly suspicious due to the fact that it is used in three of the alleged suspicious transactions pointed out above at Kronos Mart, Frydos Autosupply n’ More and Daily Dealz. Upon inspection of the card transaction detail, we can see that the card owner does have a loyalty card, but it seems like they tend to selectively use their loyalty card for discounts.

Another dubious point to note is that for the first week, they use their credit card daily but during the next week, there are gaps in the transactions. Especially after 13 Jan (Mon), the credit card user stopped using their card for a few days before making transactions on 16 Jan (Thu).

cc_loyalty_data$weekday = as.POSIXlt(cc_loyalty_data$date, format = "%d/%m/%Y")$wday

cc_loyalty_data$weekdayf <- factor(cc_loyalty_data$weekday, 
                                levels = rev(0:6),
                                labels = rev(c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")),
                                ordered = TRUE)

knitr::kable(cc_loyalty_data %>%
               filter(last4ccnum %in% 9551) %>%
               dplyr::select(timestamp, weekdayf, location, price, last4ccnum, loyaltynum), 
             caption = "Transactions made with CC 9551")
Table 6: Transactions made with CC 9551
timestamp weekdayf location price last4ccnum loyaltynum
2014-01-06 07:55:00 Mon Hallowed Grounds 8.05 9551 L5777
2014-01-06 13:21:00 Mon Hippokampos 28.23 9551 NA
2014-01-06 20:26:00 Mon Albert’s Fine Clothing 276.90 9551 L5777
2014-01-07 07:46:00 Tue Hallowed Grounds 84.44 9551 NA
2014-01-07 13:37:00 Tue Gelatogalore 21.52 9551 L5777
2014-01-08 07:56:00 Wed Hallowed Grounds 12.86 9551 NA
2014-01-08 13:43:00 Wed Hippokampos 39.80 9551 L5777
2014-01-08 21:16:00 Wed Ouzeri Elian 30.81 9551 NA
2014-01-09 07:50:00 Thu Hallowed Grounds 34.45 9551 NA
2014-01-09 13:41:00 Thu Abila Zacharo 89.41 9551 NA
2014-01-10 13:16:00 Fri Ouzeri Elian 30.71 9551 NA
2014-01-11 13:37:00 Sat Hippokampos 75.62 9551 NA
2014-01-11 19:44:00 Sat Shoppers’ Delight 149.20 9551 NA
2014-01-12 14:06:00 Sun Hippokampos 71.99 9551 NA
2014-01-13 06:04:00 Mon Daily Dealz 2.01 9551 NA
2014-01-13 13:18:00 Mon U-Pump 55.25 9551 NA
2014-01-13 13:28:00 Mon Hippokampos 30.51 9551 L5777
2014-01-13 19:20:00 Mon Frydos Autosupply n’ More 10000.00 9551 NA
2014-01-13 19:30:00 Mon Ouzeri Elian 28.75 9551 L5777
2014-01-16 08:05:00 Thu Hallowed Grounds 12.19 9551 L5777
2014-01-16 13:28:00 Thu Guy’s Gyros 10.27 9551 L5777
2014-01-16 20:28:00 Thu Ouzeri Elian 9.91 9551 L5777
2014-01-17 08:04:00 Fri Hallowed Grounds 9.40 9551 L5777
2014-01-17 20:28:00 Fri Ouzeri Elian 35.81 9551 L5777
2014-01-18 13:32:00 Sat Abila Zacharo 16.59 9551 L5777
2014-01-18 19:26:00 Sat Ouzeri Elian 61.56 9551 NA
2014-01-19 03:45:00 Sun Kronos Mart 194.51 9551 NA
2014-01-19 19:49:00 Sun Ouzeri Elian 32.77 9551 NA

Vehicle Data Analysis

Moving on, we will be analysing the vehicle data alongside with the credit and loyalty card transaction data.

First, the raster file, M2-tourist.tif, is imported into R.

bgmap <- raster("./data/MC2-tourist.tif")

Next, the vector GIS data file, Abila, is imported into R.

Abila_st <- st_read(dsn = "./data/Geospatial",
                    layer = 'Abila')
Reading layer `Abila' from data source 
  `C:\Users\conni\Documents\ISSS608_Visual_Analytics&Applications\connieyjx\DataViz_blog\_posts\2021-07-25-assignment-1\data\Geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 3290 features and 9 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 24.82401 ymin: 36.04502 xmax: 24.90997 ymax: 36.09492
Geodetic CRS:  WGS 84

The vehicle tracking csv data file is also imported into R.

gps <- read_csv("./data/gps.csv")

glimpse(gps)
Rows: 685,169
Columns: 4
$ Timestamp <chr> "1/6/2014 6:28", "1/6/2014 6:28", "1/6/2014 6:28",~
$ id        <dbl> 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35~
$ lat       <dbl> 36.07623, 36.07622, 36.07621, 36.07622, 36.07621, ~
$ long      <dbl> 24.87469, 24.87460, 24.87444, 24.87425, 24.87417, ~

After taking a look at the gps.csv file, we converted the Timestamp field from character data type to date-time format. The id field is also converted from numerical data type to factor data type.

gps$Timestamp <- date_time_parse(gps$Timestamp,
                                 zone = "",
                                 format = "%m/%d/%Y %H:%M")

gps$day <- as.factor(get_day(gps$Timestamp))

gps$id <- as.factor(gps$id)

The gps dataframe is converted into a simple feature data frame.

gps_sf <- st_as_sf(gps,
                   coords = c("long", "lat"),
                   crs = 4326)

glimpse(gps_sf)
Rows: 685,169
Columns: 4
$ Timestamp <dttm> 2014-01-06 06:28:00, 2014-01-06 06:28:00, 2014-01~
$ id        <fct> 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35~
$ day       <fct> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,~
$ geometry  <POINT [°]> POINT (24.87469 36.07623), POINT (24.8746 36~

The gps points are then joined together into movement paths by using the drivers’ IDs as unique identifier.

gps_path <- gps_sf %>%
  group_by(id, day) %>%
  summarise(m = mean(Timestamp),
            do_union = FALSE) %>%
  st_cast("LINESTRING")

glimpse(gps_path)
Rows: 508
Columns: 4
Groups: id [40]
$ id       <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, ~
$ day      <fct> 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,~
$ m        <dttm> 2014-01-06 15:01:40, 2014-01-07 12:40:38, 2014-01-~
$ geometry <LINESTRING [°]> LINESTRING (24.88258 36.066..., LINESTRI~

The stationary points of all cars over the span of 2 weeks is plotted on the map below with the blue dots. The car is deeemd to be stationary if it has stopped for more than 3 minutes.

POI <- gps_sf %>%
  group_by(id) %>%
  mutate(stoptime = Timestamp - lag(Timestamp)) %>%
  mutate(parked = ifelse(stoptime > 60*3, TRUE, FALSE)) %>%
  ungroup() %>%
  filter(parked == TRUE) %>%
  group_by(id, day) %>%
  add_count(id, day, name = "count") %>%
  ungroup()

POI_sf <- POI %>%
  filter(parked == TRUE)
tmap_mode("view")

tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1,g = 2,b = 3,
       alpha = NA,
       saturation = 1,
       interpolate = TRUE,
       max.value = 255) +
  tm_shape(POI_sf) +
  tm_dots(col = "blue")

Going back to the anormalies assessment, we can use the vehicle tracking data to supplement our analysis.

  1. The transaction amount made at Frydos Autosupply n’ More on 13 Jan is highly suspicious and needs to be investigated.

The credit card used to make that purchase ends with 9551. Hence, we can pull out all the transactions made on that card on 13 Jan. 

knitr::kable(cc_loyalty_data %>%
               filter(last4ccnum == 9551 & date == dmy(13012014)) %>%
               dplyr::select(timestamp, location, price, last4ccnum, loyaltynum) %>%
               arrange(timestamp), "simple",
             caption = "Transactions made using CC 9551 on 13 Jan")
Table 7: Transactions made using CC 9551 on 13 Jan
timestamp location price last4ccnum loyaltynum
2014-01-13 06:04:00 Daily Dealz 2.01 9551 NA
2014-01-13 13:18:00 U-Pump 55.25 9551 NA
2014-01-13 13:28:00 Hippokampos 30.51 9551 L5777
2014-01-13 19:20:00 Frydos Autosupply n’ More 10000.00 9551 NA
2014-01-13 19:30:00 Ouzeri Elian 28.75 9551 L5777

On 13 Jan, CC 9551 was used at Daily Dealz, U-Pump, Hippokampos, Frydos and Ouzeri Elian.

The gps path of all vehicles on 13 Jan is plotted for reference. Looking at the map, we can observe that on 13 Jan, only Car ID 24 made a stop at U-Pump. Hence, it is worth investigating is the owner of the Car ID 24 is the same owner of CC 9551.

gps_path_2 <- gps_path %>%
  filter(day==13)
tmap_mode("view")
tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1,g = 2,b = 3,
       alpha = NA,
       saturation = 1,
       interpolate = TRUE,
       max.value = 255) +
  tm_shape(gps_path_2) +
  tm_lines()

The stationary points of Car ID 24 on 13 Jan are as follows:

knitr::kable(POI_sf %>%
               filter(id == 24 & day == 13) %>%
               dplyr::select(id, Timestamp, geometry) %>%
               arrange(Timestamp), "simple",
             caption = "GPS Path of Car ID 24 on 13 Jan")
Table 8: GPS Path of Car ID 24 on 13 Jan
id Timestamp geometry
24 2014-01-13 07:32:00 POINT (24.89881 36.06246)
24 2014-01-13 08:07:00 POINT (24.90124 36.05406)
24 2014-01-13 11:16:00 POINT (24.87958 36.04803)
24 2014-01-13 11:46:00 POINT (24.85761 36.07666)
24 2014-01-13 12:31:00 POINT (24.85757 36.07669)
24 2014-01-13 13:22:00 POINT (24.87149 36.06777)
24 2014-01-13 17:57:00 POINT (24.87958 36.04803)
24 2014-01-13 19:29:00 POINT (24.90178 36.05493)
gps_24 <- gps_path %>%
  filter (day == 13 & id == 24)
  
POI_24 <- POI_sf %>%
  filter(id == 24 & day == 13)
tmap_mode("view")

tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1,g = 2,b = 3,
       alpha = NA,
       saturation = 1,
       interpolate = TRUE,
       max.value = 255) +
  tm_shape(gps_24) +
  tm_lines() +
  tm_shape(POI_24) +
  tm_dots(col = "blue")

Observations:

Conclusion: The user of Car ID 24 and the holder of CC 9551 might not be the same person as the card and car records do not match perfectly, making the transactions on CC 9551 very suspicious. We can infer that perhaps the CC 9951 was used by more than one person.

  1. The next anormaly is the transactions made during the wee hours at Kronos Mart.

These five different transactions are all made by different credit cards, suggesting that it might be not a habit of an individual who prefers to grocery shop at night.

knitr::kable(cc_loyalty_data %>%
               filter(location == "Kronos Mart" & hour == "3") %>%
               dplyr::select(timestamp, location, price, last4ccnum, loyaltynum) %>%
               arrange(timestamp), "simple",
             caption = "Transactions made at Kronos Mart during the wee hours")
Table 9: Transactions made at Kronos Mart during the wee hours
timestamp location price last4ccnum loyaltynum
2014-01-12 03:39:00 Kronos Mart 277.26 8156 NA
2014-01-13 03:00:00 Kronos Mart 147.30 5407 NA
2014-01-19 03:13:00 Kronos Mart 87.66 3484 NA
2014-01-19 03:45:00 Kronos Mart 194.51 9551 NA
2014-01-19 03:48:00 Kronos Mart 150.36 8332 NA

Looking at the stationary points of the cars on 12, 13 and 19 Jan, we aren’t able to point out which car is linked to the credit cards used at the mart. The only stationary points appearing near Kronos Mart are:

These timings all do not coincide with the timings in the wee hours.

POI_midnight <- POI_sf %>%
  filter(day == c(12, 13, 19))
tmap_mode("view")

tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1,g = 2,b = 3,
       alpha = NA,
       saturation = 1,
       interpolate = TRUE,
       max.value = 255) +
  tm_shape(POI_midnight) +
  tm_dots(col = "blue")

Conclusion: The credit card holders of 8156, 5407, 3484, 9551 and 8332 either stay a walking distance from the Kronos Mart or they used their own personal cars instead of the company-provide cars to make the travel.

  1. Another anormaly has to do with the large transaction occurring at Albert’s Fine Clothing and Chostus Hotel.

Looking at the Albert’s Fine Clothing suspicious transaction, we can see that the transaction occurred on 17 Jan at 19:44, using CC 1321 and L4149.

knitr::kable(spending_by_location %>%
               filter(location %in% "Albert's Fine Clothing") %>%
               filter(price %in% max(price)) %>%
               dplyr::select(location, timestamp, date, hour, price, median_price, last4ccnum, loyaltynum), 
             caption = "Transactions made in Albert's Fine Clothing")
Table 10: Transactions made in Albert’s Fine Clothing
location timestamp date hour price median_price last4ccnum loyaltynum
Albert’s Fine Clothing 2014-01-17 19:44:00 2014-01-17 19 1239.41 211.47 1321 L4149

Hence, when we plot out the stationary locations on the map (on 17 Jan, between 1900-2000), we are able to observe that Car ID 11 left the clothing store at 19:46, which matched up to the suspicious transaction time. Hence, it can be implied that the owner of Car ID 11 might hold CC 1321 and L4149.

POI_sf$hour = hour(POI_sf$Timestamp)

POI_clothing <- POI_sf %>%
  filter(day == 17 & hour == 19)
tmap_mode("view")

tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1,g = 2,b = 3,
       alpha = NA,
       saturation = 1,
       interpolate = TRUE,
       max.value = 255) +
  tm_shape(POI_clothing) +
  tm_dots(col = "blue")

Similarly, we can see that the dubious transaction at Chostus Hotel occurred on 18 Jan at 12:03. The credit card in question is 5010, tied to a loyalty card L2459.

knitr::kable(spending_by_location %>%
               filter(location %in% "Chostus Hotel") %>%
               filter(price %in% max(price)) %>%
               dplyr::select(location, timestamp, date, hour, price, median_price, last4ccnum, loyaltynum), 
             caption = "Transactions made in Chostus Hotel")
Table 11: Transactions made in Chostus Hotel
location timestamp date hour price median_price last4ccnum loyaltynum
Chostus Hotel 2014-01-18 12:03:00 2014-01-18 12 600 114.22 5010 L2459

Plotting the stationary points on the map, we can see that Car ID 31 left the hotel at 12:35, which matches up with the transaction data. Hence, it is implied that the owner of Car ID 31 might be the holder of CC 5010.

POI_hotel <- POI_sf %>%
  filter(day == 18 & hour == 12)
tmap_mode("view")

tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1,g = 2,b = 3,
       alpha = NA,
       saturation = 1,
       interpolate = TRUE,
       max.value = 255) +
  tm_shape(POI_hotel) +
  tm_dots(col = "blue")